On Feature Selection, Bias-Variance, and Bagging
Authors
Abstract
We examine the mechanism by which feature selection improves the accuracy of supervised learning. An empirical bias/variance analysis as feature selection progresses indicates that the most accurate feature set corresponds to the best bias-variance trade-off point for the learning algorithm. Often this is not the point separating relevant from irrelevant features, but rather the point where the increase in variance outweighs the gains from adding more (weakly) relevant features. In other words, feature selection can be viewed as a variance reduction method that trades off the benefits of decreased variance (from the reduction in dimensionality) against the harm of increased bias (from eliminating some of the relevant features). If a variance reduction method such as bagging is used, more (weakly) relevant features can be exploited, and the most accurate feature set is usually larger. In many cases, the best performance is obtained by using all available features.
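As a concrete illustration of this trade-off, the sketch below (hypothetical code, not the paper's experimental setup; it assumes scikit-learn and a synthetic dataset) ranks features with a univariate score and then tracks the cross-validated accuracy of a single decision tree versus a bagged ensemble as features are added. Under the abstract's argument, the unbagged tree should peak at a smaller feature set than the bagged one.

```python
# Hypothetical illustration of feature selection as a bias/variance
# trade-off (not the paper's experimental code). Assumes scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: a few strongly relevant features, many weak or irrelevant ones.
X, y = make_classification(n_samples=600, n_features=60, n_informative=10,
                           n_redundant=20, n_clusters_per_class=2,
                           random_state=0)

for k in (5, 10, 20, 40, 60):
    Xk = SelectKBest(f_classif, k=k).fit_transform(X, y)  # top-k ranked features
    tree = DecisionTreeClassifier(random_state=0)          # high-variance learner
    bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=0)                # variance-reduced learner
    acc_tree = cross_val_score(tree, Xk, y, cv=5).mean()
    acc_bag = cross_val_score(bag, Xk, y, cv=5).mean()
    print(f"k={k:2d}  tree={acc_tree:.3f}  bagged={acc_bag:.3f}")
```

On runs like this, the single tree's accuracy typically flattens or degrades earlier as weakly relevant features are added, while the bagged ensemble keeps benefiting from them; the exact crossover depends on the data.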
Similar Resources
An Application of Low Bias Bagged SVMs to the Classification of Heterogeneous Malignant Tissues
DNA microarray data are characterized by high dimensionality and small sample sizes: usually only a few tens of microarray experiments, each involving thousands of genes, are available for data processing. Considering also the large biological variability of gene expression and the noise introduced by the bio-technological machinery, we need robust and variance-reducing data analysis methods...
Combining Bias and Variance Reduction Techniques for Regression Trees
Gradient Boosting and bagging applied to regressors can reduce the error due to bias and variance, respectively. Alternatively, Stochastic Gradient Boosting (SGB) and Iterated Bagging (IB) attempt to simultaneously reduce the contributions of both bias and variance to error. We provide an extensive empirical analysis of these methods, along with two alternate bias-variance reduction approaches — ...
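A rough way to see those roles empirically (a sketch under assumptions, not this paper's protocol): refit each method on resampled versions of a synthetic regression problem and decompose test error into squared bias and variance against the known noiseless target. In scikit-learn, Stochastic Gradient Boosting corresponds to GradientBoostingRegressor with subsample < 1.0; Iterated Bagging has no stock implementation, so plain bagging stands in for the variance-reduction side here.

```python
# Hedged sketch: empirical bias^2/variance of boosting vs. bagging on a
# synthetic regression task with a known noiseless target. Illustration only.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor

# Noiseless test targets give the true regression function f(x).
X_test, f_test = make_friedman1(n_samples=500, noise=0.0, random_state=1)

def bias_variance(make_model, n_rounds=20, n_train=200):
    preds = []
    for r in range(n_rounds):
        X, y = make_friedman1(n_samples=n_train, noise=1.0, random_state=100 + r)
        preds.append(make_model().fit(X, y).predict(X_test))
    preds = np.array(preds)
    bias2 = np.mean((preds.mean(axis=0) - f_test) ** 2)  # squared bias
    var = np.mean(preds.var(axis=0))                     # variance over resamples
    return bias2, var

for name, make_model in [
    ("SGB (subsample=0.5)", lambda: GradientBoostingRegressor(subsample=0.5,
                                                              random_state=0)),
    ("bagged trees", lambda: BaggingRegressor(n_estimators=50, random_state=0)),
]:
    b2, v = bias_variance(make_model)
    print(f"{name:22s} bias^2={b2:.3f}  variance={v:.3f}")
```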
Dynamics of Variance Reduction in Bagging and Other Techniques Based on Randomisation
In this paper the performance of bagging in classification problems is theoretically analysed, using a framework developed in works by Tumer and Ghosh and extended by the authors. A bias-variance decomposition is derived, which relates the expected misclassification probability attained by linearly combining classifiers trained on N bootstrap replicates of a fixed training set to that attained ...
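The quantity that decomposition describes is easy to estimate by simulation (a hypothetical sketch assuming scikit-learn, not the authors' theoretical analysis): hold the training set fixed, average the predicted class probabilities of classifiers trained on N bootstrap replicates, and watch the test misclassification rate as N grows.

```python
# Hedged sketch: misclassification of a linear (probability-averaging)
# combination of classifiers trained on N bootstrap replicates of one
# fixed training set. Illustration only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

rng = np.random.RandomState(0)
n = len(X_tr)

def bagged_error(N):
    # Average predicted probabilities over N bootstrap-trained trees.
    proba = np.zeros((len(X_te), 2))
    for _ in range(N):
        idx = rng.randint(0, n, n)  # bootstrap replicate of the fixed training set
        clf = DecisionTreeClassifier().fit(X_tr[idx], y_tr[idx])
        proba += clf.predict_proba(X_te)
    return np.mean(proba.argmax(axis=1) != y_te)

for N in (1, 5, 10, 25, 50):
    print(f"N={N:2d}  test error={bagged_error(N):.3f}")
```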
Low Bias Bagged Support Vector Machines
Theoretical and experimental analyses of bagging indicate that it is primarily a variance reduction technique. This suggests that bagging should be applied to learning algorithms tuned to minimize bias, even at the cost of some increase in variance. We test this idea with Support Vector Machines (SVMs) by employing out-of-bag estimates of bias and variance to tune the SVMs. Experiments indicate...
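A simplified rendering of that idea (a sketch under assumptions; the paper's actual procedure uses out-of-bag estimates of the bias term from a bias-variance decomposition, which this shortcut replaces with the plain out-of-bag accuracy of the aggregated vote): sweep the SVM hyperparameters toward lower bias and keep the setting whose bagged ensemble scores best out-of-bag.

```python
# Hedged sketch of low-bias bagged SVMs: tune (C, gamma) by the
# out-of-bag score of the bagged ensemble, then keep the best ensemble.
# A simplification of the paper's OOB bias/variance tuning. Assumes scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=30, n_informative=10,
                           random_state=0)

best = None
for C in (0.1, 1.0, 10.0, 100.0):      # larger C -> less regularized, lower-bias SVMs
    for gamma in (0.001, 0.01, 0.1):
        bag = BaggingClassifier(SVC(C=C, gamma=gamma), n_estimators=30,
                                oob_score=True, random_state=0).fit(X, y)
        if best is None or bag.oob_score_ > best[0]:
            best = (bag.oob_score_, C, gamma, bag)

print(f"best OOB score={best[0]:.3f} at C={best[1]}, gamma={best[2]}")
```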
Cross-Validated Bagged Learning
Many applications aim to learn a high-dimensional parameter of a data-generating distribution based on a sample of independent and identically distributed observations. For example, the goal might be to estimate the conditional mean of an outcome given a list of input variables. In this prediction context, bootstrap aggregating (bagging) has been introduced as a method to reduce the variance of...
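In the prediction setting described, that variance reduction is easy to sketch (hypothetical code assuming scikit-learn; this is not the cross-validated bagging procedure from the paper): estimate the conditional mean with and without bagging and let cross-validation arbitrate between the two.

```python
# Hedged sketch: bagging as variance reduction for a conditional-mean
# estimator, with cross-validation used to compare the two. Assumes scikit-learn.
from sklearn.datasets import make_friedman1
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_friedman1(n_samples=400, noise=1.0, random_state=0)

single = DecisionTreeRegressor(random_state=0)
bagged = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100,
                          random_state=0)

for name, est in [("single tree", single), ("bagged trees", bagged)]:
    mse = -cross_val_score(est, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"{name:12s} CV mean squared error = {mse:.3f}")
```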